The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Authors
Abstract
While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on the ImageNet classification task have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called “perceptual losses”? What elements are critical for their success? To answer these questions, we introduce a new Full Reference Image Quality Assessment (FR-IQA) dataset of perceptual human judgments, orders of magnitude larger than previous datasets. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by huge margins. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
1. Motivation
The ability to compare data items is perhaps the most fundamental operation underlying all of computing. In many areas of computer science it does not pose much difficulty: one can use Hamming distance to compare binary patterns, edit distance to compare text files, Euclidean distance to compare vectors, etc. The unique challenge of computer vision is that even this seemingly simple task of comparing visual patterns remains a wide-open problem. Not only are visual patterns very high-dimensional and highly correlated, but the very notion of visual similarity is often subjective, aiming to mimic human visual perception.
For instance, in image compression, the goal is for the compressed image to be indistinguishable from the original by a human observer, irrespective of the fact that their pixel representations might be very different. Classic per-pixel measures, such as the ℓ2 Euclidean distance, commonly used for regression problems, or the related Peak Signal-to-Noise Ratio (PSNR), are insufficient for assessing structured outputs such as images, since they assume each output pixel is conditionally independent of all others, given the input. A well-known example is that blurring an image causes large perceptual but small Euclidean change. What we would really like is a “perceptual distance”, which measures how similar two images are in a way that coincides with human judgment. This problem, often ...
arXiv:1801.03924v1 [cs.CV] 11 Jan 2018
[Figure: original and perturbed patches; panel (a) Traditional]
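To make the point about per-pixel measures concrete, here is a minimal sketch in plain Python of mean squared error (the squared ℓ2 distance averaged over pixels) and PSNR. The tiny eight-pixel "image", the two distortions, and the function names `mse` and `psnr` are hypothetical and chosen only for illustration; they do not come from the paper. The sketch shows that two distortions with different spatial structure can receive exactly the same score, because each pixel is scored independently of its neighbours.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 8-pixel grayscale "image", used only for illustration.
reference = [10, 50, 90, 130, 170, 210, 120, 60]

# Distortion A: every pixel shifted by +4 (a uniform brightness change).
shifted = [p + 4 for p in reference]

# Distortion B: alternating +4 / -4 (a high-frequency pattern).
striped = [p + (4 if i % 2 == 0 else -4) for i, p in enumerate(reference)]

# Both distortions change every pixel by magnitude 4, so the per-pixel
# scores cannot tell them apart even though their structure differs.
print(mse(reference, shifted), mse(reference, striped))
print(psnr(reference, shifted), psnr(reference, striped))
```

Both calls to `mse` return the same value, and so do both calls to `psnr`: the spatial arrangement of the errors plays no role in either score. Whether a human observer would rate the two distortions as equally similar to the reference is exactly the kind of question these measures cannot answer.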
Similar resources
The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients
Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on the voice features in Parkinson’s disease (PD) is controversial. No study has comprehensively evaluated the voice features of PD patients who underwent STN-DBS using acoustic, perceptual, and patient-based assessments. Furthermore, no study has investigated prosodic features before and after DBS in PD. The curren...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
The Effectiveness of Rebound Therapy on Improving Perceptual Visual Coordination and Social Development of Students with Learning Disabilities
Objective: The purpose of this study was to investigate the effectiveness of rebound therapy on improving perceptual visual coordination and social development of students with learning disabilities. Method: The method of this research is applied. To do this, the semi-experimental research method was used using pre-test and post-test with the control group. The statistical society consisted of ...
Comparison of Effectiveness of Motor-Working Memory Training and Perceptual-Motor Exercises on Digit Span and Letter–Number Sequencing in Educable Children with Intellectual Disabilities
Background and Objective: Appropriate programs should be provided to improve the function of memory, learning, and the effects of processing efficiency in the daily life of children with intellectual disabilities. Therefore, the present study aimed to compare the effectiveness of motor-working memory training and perceptual-motor exercises on digit span and letter-number sequencing in educable ...
Image authentication using LBP-based perceptual image hashing
Feature extraction is a main step in all perceptual image hashing schemes, in which robust features will lead to better results in perceptual robustness. Simplicity, discriminative power, computational efficiency and robustness to illumination changes are counted as distinguishing properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for percep...
Journal title:
- CoRR
Volume abs/1801.03924 Issue
Pages -
Publication date 2018